Abstract

In this paper, we propose a low-rank sparse coding (LRSC) method that exploits local structure information among features in an image for the purpose of image-level classification. LRSC represents densely sampled SIFT descriptors in a spatial neighborhood collectively as low-rank, sparse linear combinations of codewords. As such, it casts the feature coding problem as a low-rank matrix learning problem, which differs from previous methods that encode features independently. LRSC has several attractive properties. (1) It encourages sparsity in feature codes, locality in codebook construction, and low-rankness for spatial consistency. (2) It encodes local features jointly by exploiting their low-rank structure, and it is computationally efficient. We evaluate LRSC by comparing its performance on a set of challenging benchmarks with that of 7 popular coding methods and other state-of-the-art methods. Our experiments show that, by representing local features jointly, LRSC not only outperforms the state of the art in classification accuracy but also reduces computation time relative to methods that use a similar sparse linear representation model for feature coding [36].

1. Introduction

The bag-of-words (BoW) model is one of the most popular models for feature design. It has been successfully applied to classical computer vision applications, including scene classification [22], image-level object recognition [9, 13], and action recognition [23]. The conventional BoW pipeline for classification consists of five stages: feature extraction and description, codebook design, feature coding, feature pooling, and classifier construction. Recently, different approaches have been proposed to improve both the generative property of BoW, which helps it represent images accurately, and its discriminative power for classification. Despite remarkable progress in this field, there is significant room for improvement, especially in how the local features of an image are encoded.


Given an image, features such as SIFT [27], HOG [7], and SURF [2] can be densely extracted and encoded with a codebook constructed using K-means clustering. Many feature coding methods have recently been proposed, including hard-assignment coding (HC) [22], soft-assignment coding (SC*) [33], localized soft-assignment coding (LSC) [25], sparse coding (SCSPM) [36], locality-constrained linear coding (LLC) [18], Laplacian sparse coding (LScSPM) [11], salient coding (SC) [15], and locality-constrained and spatially regularized coding (LCSRC) [31]. After codes are computed for the local features, they are pooled together to form equal-sized feature vectors, each representing one image in a dataset. Popular pooling methods include average pooling (e.g. histogram) and max pooling [36]. To include the spatial layout of local features in an image, Spatial Pyramid Matching (SPM) [22] is usually performed to obtain an image-level representation that can be used to discriminate different categories of objects, scenes, or actions. Using this BoW representation, images can be classified with a plethora of discriminative models such as SVM or Boosting.

Recent work shows that, given a visual codebook, the method used to encode local features has a significant impact on classification performance. The earliest method is hard-assignment coding (vector quantization) [22], a voting scheme that is simple yet highly sensitive to the choice of codebook. A more robust voting approach is soft-assignment coding [33], which assigns a local feature a code coefficient for each visual word according to their pairwise distance. To improve on hard and soft-assignment coding, sparsity is enforced on local feature codes via sparse learning techniques [36]. However, sparse coding is time consuming and usually leads to inconsistent codes [18, 11], i.e. local features with similar descriptors tend to have different sparse codes. To alleviate this inconsistency, the authors in [37] introduce another coding property, called locality, which encourages the visual words used to represent a local feature to be similar to the feature's descriptor itself. This is usually ensured by constructing a feature's codebook from its nearest neighbors in the universal codebook. In fact, several implementations of locality have been proposed in [18, 25, 15], where each descriptor is coded on locally selected bases. The work in [15] rebrands the locality property as codebook ‘saliency’. However, all the aforementioned coding schemes encode local features independently. In Laplacian sparse coding [11], a global similarity between local features is considered to constrain sparsity; however, this method is not only computationally infeasible for large sets of features, but it also disregards the relationship between sparse codes and the spatial layout (or context) of their corresponding features. The property of spatial consistency encourages local features that are spatially close in an image to have similar sparse codes and similar supports. The latter implication encourages code consistency among features and suggests that the same visual words represent each local feature in a spatial neighborhood. Very little work has exploited this property in feature coding. Only recently has the spatial layout of local features been used to select ‘optimal’ visual words for each feature in an image [31]. This optimal selection is formulated as a labeling problem that minimizes a pairwise multi-label MRF energy. Despite its awareness of spatial layout, this method invokes the spatial consistency property only in the codebook selection process, which is done independently of the coding itself. Although features in a spatial neighborhood are encouraged to be represented by the same set of visual words, their sparse codes are not directly encouraged to be similar or to have similar supports w.r.t. their ‘optimal’ bases.

Figure 1. Observing the sparse low-rank property of feature descriptors in natural images. (a) Image segmented into superpixels; (b) all SIFT descriptors densely sampled in (a), concatenated in matrix form; (d) SIFT descriptors from the local region depicted in red in (a). From (b), we see that SIFT descriptors in the image tend to be sparse and low-rank. This observation holds over thousands of natural images; the histogram of the rank of their SIFT descriptor matrices is shown in (c). Clearly, the average rank (68) in an image is much smaller than its maximum (128), even though each image usually contains thousands of SIFT features. Moreover, this observation transfers locally to superpixels within the image. The histogram of the rank of SIFT descriptors in each superpixel of (a) is plotted in (e). Again, the average rank (18) is smaller than the average number (30) of SIFT features in each superpixel.

As stated before, maintaining spatial consistency among feature codes enables a more faithful representation of an image and has been shown to improve classification performance. However, there are many conceivable ways of enforcing such consistency, and they tend to stem from empirical observations about spatial relationships between local features in natural images. We observe that the descriptors of detected SIFT points in the same image are not independent but exhibit a dependency relationship. Figure 1 shows an example of this observation. In Figure 1(b), all SIFT descriptors ($\in \mathbb{R}^{128}$) in the image are concatenated in matrix form. This matrix tends to be sparse and low-rank (refer to Figure 1(d)). In Figure 1(c), we plot the histogram of the rank of this SIFT matrix over thousands of natural images, each containing thousands of dense SIFT features. As expected, the average rank of this matrix (68) is much smaller than its maximum possible rank (128). This low-rank sparsity observation is even more pronounced locally within the same image. By dividing the image into superpixels as shown in Figure 1(a), we observe that the matrix of descriptors for SIFT points in a particular superpixel (denoted in red) is also sparse and low-rank, as shown in Figure 1(d). In Figure 1(e), we plot the histogram of the rank of SIFT matrices over all superpixels in the image. Clearly, the average rank (18) is much smaller than the average number of SIFT features (30) in each superpixel. Similar observations are also made in [20].
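Rank statistics of the kind plotted in Figure 1 can be computed with a few lines of NumPy. The sketch below is illustrative only: it assumes descriptors are stored as the columns of a d × n matrix and that "rank" means numerical rank, with singular values below a relative tolerance (the hypothetical parameter `rel_tol`) treated as zero. The tolerance actually used for Figure 1 is not stated, and the data below are synthetic.

```python
import numpy as np


def descriptor_rank(X, rel_tol=1e-2):
    """Numerical rank of a d x n descriptor matrix X (one descriptor per column).

    Singular values below rel_tol * (largest singular value) are treated as
    zero. rel_tol is an assumed, illustrative threshold.
    """
    s = np.linalg.svd(X, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


def rank_histogram(matrices, bins=10, rel_tol=1e-2):
    """Histogram of numerical ranks over a collection of descriptor matrices,
    e.g. one matrix per image or one matrix per superpixel."""
    ranks = [descriptor_rank(X, rel_tol) for X in matrices]
    return np.histogram(ranks, bins=bins)


# Synthetic example: 30 descriptors in R^128 that span only a 5-D subspace.
rng = np.random.default_rng(0)
X_region = rng.standard_normal((128, 5)) @ rng.standard_normal((5, 30))
print(descriptor_rank(X_region))  # 5 for this construction
```

Applying `rank_histogram` to one matrix per image, or to one matrix per superpixel, would yield histograms analogous to Figure 1(c) and 1(e).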

Inspired by this observation and by prior work on feature coding, we propose a low-rank sparse coding (LRSC) method that encourages both sparsity and spatial consistency in the coding step of the BoW model. Here, the joint coding of features in a local region is viewed as a low-rank sparse learning problem. Unlike previous methods, we exploit similarities among local features lying in the same spatial neighborhood and therefore seek an accurate joint representation of these local features w.r.t. a codebook that satisfies the locality property. In LRSC, the codes of local features are sparse and low-rank, which encourages only a few (but the same) visual words to be used to represent all features in a local region. As opposed to sparse coding based image classification methods [36, 18] that handle local features independently, our use of sparse low-rank learning realizes the benefits of a sparse feature representation while respecting the underlying spatial relationship among local features. Feature codes are computed by solving a sparse low-rank optimization problem through a sequence of closed-form update steps, made possible by the Inexact Augmented Lagrange Multiplier (IALM) method, which converges quickly.
Contributions: The contributions of this work are threefold. (1) We propose a low-rank sparse learning method for feature coding, which is a robust sparse coding method that mines correlations among different local features to obtain better coding results than learning each feature individually. To the best of our knowledge, this is the first work to exploit low-rank sparse learning in feature coding. (2) We show that popular sparse coding methods [36, 18] are special cases of our LRSC formulation. (3) We learn local feature codes jointly with an efficient IALM method. As a result, LRSC generally outperforms state-of-the-art coding methods while remaining computationally attractive.


2. Related Work


In this section, we survey commonly used coding schemes. Let $\mathbf{b}_j \in \mathbb{R}^d$ denote a visual word in the codebook, where $d$ is the dimensionality of a local feature. The total number of visual words is $n$. Matrix $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n] \in \mathbb{R}^{d \times n}$ denotes a visual codebook, or a set of basis vectors. Let $\mathbf{x}_i \in \mathbb{R}^d$ be the $i$th local feature in an image, and let $\mathbf{z}_i \in \mathbb{R}^n$ be the code of $\mathbf{x}_i$, with $z_{ij}$ being the coefficient w.r.t. word $\mathbf{b}_j$.


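Hard-assignment coding (HC) [22]: For a local feature $\mathbf{x}_i$, there is one and only one nonzero coding coefficient. It corresponds to the nearest visual word under a predefined distance. When we adopt the Euclidean distance,

$$ z_{ij} = \begin{cases} 1, & \text{if } j = \arg\min_{k = 1, \dots, n} \|\mathbf{x}_i - \mathbf{b}_k\|_2^2 \\ 0, & \text{otherwise} \end{cases} $$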
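A minimal NumPy sketch of hard-assignment coding under the Euclidean distance follows. It assumes features and codewords are stored as columns (X is d × N, B is d × n); the function names and the toy data are illustrative, not taken from [22].

```python
import numpy as np


def pairwise_sq_dists(X, B):
    """Squared Euclidean distances ||x_i - b_j||^2 as an n x N matrix."""
    return (np.sum(B**2, axis=0)[:, None]
            - 2.0 * B.T @ X
            + np.sum(X**2, axis=0)[None, :])


def hard_assignment_codes(X, B):
    """HC: one nonzero coefficient (equal to 1) per feature, at its nearest word."""
    nearest = np.argmin(pairwise_sq_dists(X, B), axis=0)
    Z = np.zeros((B.shape[1], X.shape[1]))
    Z[nearest, np.arange(X.shape[1])] = 1.0
    return Z


B = np.eye(3)                       # toy codebook: 3 words in R^3
X = np.array([[0.9, 0.0],
              [0.1, 0.2],
              [0.0, 0.8]])          # two features as columns
print(hard_assignment_codes(X, B))  # columns select words 0 and 2
```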
Soft-assignment coding (SC*) [33]: The $j$th coding coefficient represents the degree of membership of a local feature $\mathbf{x}_i$ to the $j$th visual word, where $\alpha$ is the smoothing factor controlling the softness of the assignment. Note that all $n$ visual words are used in computing $z_{ij}$.

$$ z_{ij} = \frac{\exp(-\alpha \|\mathbf{x}_i - \mathbf{b}_j\|_2^2)}{\sum_{k=1}^{n} \exp(-\alpha \|\mathbf{x}_i - \mathbf{b}_k\|_2^2)} $$
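The soft-assignment rule can be sketched the same way. This is an illustrative implementation, not the code of [33]. It subtracts the per-feature maximum logit before exponentiating, which leaves the ratio unchanged but avoids overflow. The argument `alpha` plays the role of the smoothing factor $\alpha$.

```python
import numpy as np


def soft_assignment_codes(X, B, alpha=1.0):
    """SC*: z_ij proportional to exp(-alpha * ||x_i - b_j||^2), normalized over all n words.

    X: d x N features (columns), B: d x n codebook (columns). Returns n x N codes.
    """
    d2 = (np.sum(B**2, axis=0)[:, None]
          - 2.0 * B.T @ X
          + np.sum(X**2, axis=0)[None, :])
    logits = -alpha * d2
    logits -= logits.max(axis=0, keepdims=True)  # stability shift; cancels in the ratio
    W = np.exp(logits)
    return W / W.sum(axis=0, keepdims=True)


B = np.eye(3)
X = np.array([[0.9, 0.0], [0.1, 0.2], [0.0, 0.8]])
Z = soft_assignment_codes(X, B, alpha=5.0)
print(Z.sum(axis=0))  # each column sums to 1
```

As `alpha` grows, the codes approach the one-hot codes of hard assignment. As it shrinks, every word receives a similar weight.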


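Laplacian sparse coding (LScSPM) [11]: This is the first method that improves the consistency of sparse coding, by encouraging similar local features in the dataset to have similar sparse codes. This is done by adding a graph regularization term to the LASSO. Codebook learning and sparse coding are performed alternately.

$$ (\mathbf{B}, \mathbf{Z}) = \arg\min_{\mathbf{B}, \mathbf{Z}} \|\mathbf{X} - \mathbf{B}\mathbf{Z}\|_F^2 + \lambda \sum_i \|\mathbf{z}_i\|_1 + \beta\, \mathrm{tr}(\mathbf{Z}\mathbf{L}\mathbf{Z}^\top) \quad \text{s.t.} \quad \|\mathbf{b}_j\|_2^2 \le 1, \ \forall j = 1, \dots, n $$

Here, $\mathbf{L}$ is the Laplacian of the graph that encodes the relationship between local features. Due to the extremely large number of features in a dataset, constructing the Laplacian matrix and learning sparse codes simultaneously is computationally infeasible, so some heuristic measures are taken to moderately reduce the computational cost.

Salient coding (SC) [15]: This is an alternative to sparse coding. It exploits codebook locality by setting the code to a “saliency” degree computed from the codebook bases nearest to $\mathbf{x}_i$. Here, $\Phi(\cdot)$ is a monotonically decreasing function and $\mathcal{N}_k(\mathbf{x}_i)$ is the set of $k$-nearest bases to $\mathbf{x}_i$.

$$ z_{ij} = \begin{cases} \Phi\!\left( \dfrac{\|\mathbf{x}_i - \mathbf{b}_j\|_2}{\frac{1}{k-1} \sum_{\mathbf{b}_m \in \mathcal{N}_k(\mathbf{x}_i),\, m \neq j} \|\mathbf{x}_i - \mathbf{b}_m\|_2} \right), & \text{if } j = \arg\min_{l} \|\mathbf{x}_i - \mathbf{b}_l\|_2 \\ 0, & \text{otherwise} \end{cases} $$

Localized soft-assignment coding (LSC) [25]: The basic idea is to use only the $k$ visual words in the neighborhood of a local feature to refine soft-assignment coding [33].

$$ z_{ij} = \begin{cases} \dfrac{\exp(-\alpha \|\mathbf{x}_i - \mathbf{b}_j\|_2^2)}{\sum_{\mathbf{b}_m \in \mathcal{N}_k(\mathbf{x}_i)} \exp(-\alpha \|\mathbf{x}_i - \mathbf{b}_m\|_2^2)}, & \text{if } \mathbf{b}_j \in \mathcal{N}_k(\mathbf{x}_i) \\ 0, & \text{otherwise} \end{cases} $$

Sparse coding (SCSPM) [36]: It represents a local feature $\mathbf{x}_i$ by a linear combination of a sparse set of basis vectors in the codebook. The coefficient vector $\mathbf{z}_i$ is obtained by solving an $\ell_1$-norm regularized problem,

$$ \mathbf{z}_i = \arg\min_{\mathbf{z}_i} \|\mathbf{x}_i - \mathbf{B}\mathbf{z}_i\|_2^2 + \lambda \|\mathbf{z}_i\|_1 $$

Locality-constrained linear coding (LLC) [18]: Unlike sparse coding, LLC enforces codebook locality instead of sparsity. This leads to smaller coefficients for basis vectors farther away from $\mathbf{x}_i$. The code $\mathbf{z}_i$ is computed by solving the following regularized least-squares problem,

$$ \mathbf{z}_i = \arg\min_{\mathbf{z}_i} \|\mathbf{x}_i - \mathbf{B}\mathbf{z}_i\|_2^2 + \lambda \|\mathbf{d}_i \odot \mathbf{z}_i\|_2^2 \quad \text{s.t.} \quad \mathbf{1}^\top \mathbf{z}_i = 1 $$

where $\mathbf{d}_i = \exp(\mathrm{dist}(\mathbf{x}_i, \mathbf{B}) / \sigma)$, $\mathrm{dist}(\mathbf{x}_i, \mathbf{B}) = [\mathrm{dist}(\mathbf{x}_i, \mathbf{b}_1), \mathrm{dist}(\mathbf{x}_i, \mathbf{b}_2), \dots, \mathrm{dist}(\mathbf{x}_i, \mathbf{b}_n)]^\top$, and $\mathrm{dist}(\mathbf{x}_i, \mathbf{b}_j)$ denotes the $\ell_2$ distance between $\mathbf{x}_i$ and $\mathbf{b}_j$. The parameter $\sigma$ adjusts the weight decay speed of the locality adaptor. In [18], an approximation is proposed to improve its computational efficiency.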


Locality-constrained and spatially regularized coding (LCSRC) [31]: The spatial layout of local features in the same image is used to select “optimal” bases for each local feature. It assumes that a local feature $\mathbf{x}_p$ should have bases similar to those of its nearest neighbors $\mathbf{x}_q$. This is done by solving a pairwise multi-label MRF problem. Once bases are selected for the local features, their codes can be computed using any of the previous coding methods.
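To make the LLC objective above concrete, the sketch below solves it exactly for a single feature. It relies on the observation that, under $\mathbf{1}^\top\mathbf{z}_i = 1$, the data term equals $\mathbf{z}_i^\top \mathbf{C}\mathbf{z}_i$ with $\mathbf{C} = (\mathbf{B} - \mathbf{x}_i\mathbf{1}^\top)^\top(\mathbf{B} - \mathbf{x}_i\mathbf{1}^\top)$, so the constrained minimizer is proportional to $(\mathbf{C} + \lambda\,\mathrm{diag}(\mathbf{d}_i)^2)^{-1}\mathbf{1}$. This is a hedged illustration of the exact objective, not the approximate solver proposed in [18]. The default values of `lam` and `sigma` are placeholders.

```python
import numpy as np


def llc_code(x, B, lam=1e-4, sigma=1.0):
    """Exact LLC code for one feature x (shape d,) and codebook B (d x n).

    Solves  min_z ||x - B z||^2 + lam * ||d * z||^2  s.t.  sum(z) = 1,
    with locality adaptor d_j = exp(||x - b_j||_2 / sigma).
    """
    n = B.shape[1]
    shifted = B - x[:, None]                    # columns b_j - x
    dist = np.linalg.norm(shifted, axis=0)      # l2 distance to each codeword
    d = np.exp(dist / sigma)                    # grows quickly; pick sigma w.r.t. the distance scale
    C = shifted.T @ shifted                     # data term equals z^T C z when sum(z) = 1
    M = C + lam * np.diag(d**2)                 # positive definite for lam > 0
    z = np.linalg.solve(M, np.ones(n))
    return z / z.sum()                          # rescale to satisfy sum(z) = 1


rng = np.random.default_rng(0)
B = rng.standard_normal((4, 6))
x = rng.standard_normal(4)
z = llc_code(x, B, lam=0.1)
print(z.sum())  # 1.0 up to rounding
```

Because the locality penalty grows exponentially with distance, codewords far from `x` receive coefficients close to zero, which is the locality behavior described above.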

Most of the aforementioned coding schemes (except for LScSPM) produce feature codes independently. Although LScSPM [11] adopts a global similarity between local features, it ignores local spatial contextual information [16, 39, 40] and is computationally expensive. LCSRC [31] makes use of the spatial layout of local features in the same image, but only to constrain codebook selection; it does not directly enforce consistency on the codes themselves. To the best of our knowledge, the proposed low-rank sparse coding (LRSC) method is the first to introduce spatial consistency and joint feature coding explicitly in the coding step of the BoW model. In the next section, we provide a detailed description of LRSC.

3. Low-Rank Sparse Coding (LRSC)

Here, we give a detailed description of our local feature coding method, which makes use of low-rank sparse learning.

3.1. Low-Rank Sparse Representation

As seen in Figure 1, SIFT descriptors tend to be collectively sparse and low-rank across natural images, and especially within spatial neighborhoods of the same image. However, many existing methods [22, 36, 18, 25, 15] ignore this information and encode features independently. In this paper, we formulate local feature coding as a low-rank sparse learning problem, which encourages sparsity and low-rankness locally among features in the image. Since the low-rank sparsity property is more evident locally, we apply low-rank sparse learning to code features in the same region of an image, by dividing the image into homogeneous superpixels. Without loss of generality, we use the SLIC segmentation algorithm [1] and divide each image into about 150 coherent superpixels. The details and effects of segmentation are discussed in Section 5.1. Note that the solution is general and not tied to any specific image segmentation algorithm.

Following many coding methods [22, 36, 18, 25, 15], LRSC densely samples SIFT features in an image. Each region contains $n$ local features, whose observations (SIFT descriptors) are concatenated in matrix form as $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n]$. Each column is a local feature point in $\mathbb{R}^d$, where usually $d = 128$. Given a codebook $\mathbf{D} = [\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_m]$, in the noiseless case each local feature $\mathbf{x}_i$ is represented as a linear combination $\mathbf{z}_i$ of the elements forming the codebook $\mathbf{D}$, such that $\mathbf{X} = \mathbf{D}\mathbf{Z}$.
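One way to assemble the region-wise matrices $\mathbf{X}$ is sketched below. These are hypothetical helpers, not code from the paper. They compute the centers of 16 × 16 patches sampled every 4 pixels (the setup described in Section 5) and assign each patch to the superpixel containing its center, given any integer label map such as one produced by SLIC. Assigning by patch center is our assumption; other rules (e.g. a majority vote over the patch) are equally plausible.

```python
import numpy as np


def dense_grid_centers(height, width, patch=16, step=4):
    """(row, col) centers of patch x patch windows sampled every `step` pixels."""
    rows = np.arange(0, height - patch + 1, step) + patch // 2
    cols = np.arange(0, width - patch + 1, step) + patch // 2
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)


def group_by_region(centers, labels):
    """Map each region label to the indices of features whose center lies in it."""
    region_of = labels[centers[:, 0], centers[:, 1]]
    return {int(r): np.flatnonzero(region_of == r) for r in np.unique(region_of)}


# Toy 32 x 32 image split into two "superpixels" (left and right halves).
labels = np.zeros((32, 32), dtype=int)
labels[:, 16:] = 1
centers = dense_grid_centers(32, 32)
groups = group_by_region(centers, labels)
# With descriptors stored as columns of X_all (d x N), region r uses X_all[:, groups[r]].
print({r: idx.size for r, idx in groups.items()})  # {0: 10, 1: 15}
```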

Figure 2. A feature coding example in a local region. (a) Image partition results; (b) all SIFT descriptors in the local region depicted in red in (a); (c) and (d) coding results produced by SCSPM [36] and LRSC, respectively. From (c), we see that the local features have inconsistent codes, i.e. their features are similar but their codes and the supports of their codes are not. This is because SCSPM solves the coding problem for each feature independently. In contrast, the codes learned by LRSC are jointly sparse, i.e. a few (but the same) visual words are used to represent all the local features together, which renders the codes consistent and more robust to noise. The dictionary is obtained by using the locality property.

We base the formulation of LRSC on the following observations. (a) Because features are densely sampled in a local region, they tend to have similar descriptors, as exemplified in Figure 1. Consequently, their representations w.r.t. $\mathbf{D}$ should also be similar, and the resulting representation matrix $\mathbf{Z}$ is expected to be low-rank. More formally, since $\mathbf{X} = \mathbf{D}\mathbf{Z}$, we have $\mathrm{rank}(\mathbf{X}) \le \mathrm{rank}(\mathbf{Z})$, with equality only when $\mathbf{D}$ has full column rank, which an overcomplete $\mathbf{D}$ does not. However, because $\mathbf{D}$ is overcomplete, $\mathbf{X} = \mathbf{D}\mathbf{Z}$ admits many solutions, and when $\mathbf{D}$ has full row rank, one of them, $\mathbf{Z} = \mathbf{D}^{+}\mathbf{X}$, has rank exactly equal to $\mathrm{rank}(\mathbf{X})$. Therefore, if $\mathrm{rank}(\mathbf{X})$ is low (as shown in Figure 1), a low-rank code $\mathbf{Z}$ exists, and it is natural to seek one (as shown in Figure 2). (b) For an overcomplete dictionary $\mathbf{D}$, linear feature representations w.r.t. $\mathbf{D}$ tend to be sparse. In other words, only a few elements of $\mathbf{D}$ are required to reliably represent a local feature $\mathbf{x}_i$, or equivalently, only a few nonzero coefficients exist in its representation $\mathbf{z}_i$. In fact, sparse feature coding has been shown to be quite helpful in image classification [36, 26, 18]. We combine (a) and (b) to formulate the problem mathematically in Eq. (1), whose solution is described in Section 4.

$$ \min_{\mathbf{Z}} \ \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Z}\|_F^2 + \lambda_1 \|\mathbf{Z}\|_* + \lambda_2 \|\mathbf{Z}\|_{1,1} \tag{1} $$

The nuclear norm $\|\mathbf{Z}\|_*$ (the sum of the singular values of $\mathbf{Z}$) and the sparsity-inducing $\ell_1$ norm $\|\mathbf{Z}\|_{1,1} = \sum_{i=1}^{n} \|\mathbf{z}_i\|_1$ are convex approximations to the rank function and the $\ell_0$ norm, respectively. The weights $\lambda_1$ and $\lambda_2$ control the tradeoff between low-rankness and sparsity in the feature codes, relative to the reconstruction error.
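Evaluating the objective in Eq. (1) is straightforward and can be useful, for instance, for monitoring convergence. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np


def lrsc_objective(X, D, Z, lam1, lam2):
    """0.5 * ||X - D Z||_F^2 + lam1 * ||Z||_* + lam2 * ||Z||_{1,1}  (Eq. (1))."""
    residual = X - D @ Z
    nuclear = np.linalg.svd(Z, compute_uv=False).sum()  # sum of singular values
    l11 = np.abs(Z).sum()                                # sum of |z_ij| over all columns
    return 0.5 * np.sum(residual**2) + lam1 * nuclear + lam2 * l11


Z = np.diag([3.0, -4.0])
D = np.eye(2)
print(lrsc_objective(D @ Z, D, Z, lam1=1.0, lam2=2.0))  # 0 + 1*7 + 2*7, i.e. 21 up to rounding
```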

3.2. Discussion

As stated earlier, many feature coding schemes exist in the literature. In HC [22] and SC* [33], different voting schemes are adopted to obtain $\mathbf{z}_i$ for each local feature. SCSPM [36] improves upon these methods by enforcing sparsity in $\mathbf{z}_i$. However, solving an $\ell_1$ problem for each local feature independently is computationally expensive, especially for large codebooks. In LLC [18], LSC [25], and SC [15], locality in codebook selection is adopted and better performance is obtained. In LScSPM [11], a global similarity among features is adopted to model the relationship among feature points in feature space. However, it incurs a significantly higher computational cost and ignores spatial relationships between features in the same image. Recently, spatial consistency has been successfully adopted by LCSRC [31] for feature coding. However, LCSRC ignores the fact that features in a local region not only have similar codebooks but also similar representations; features with similar bases that are coded independently may still have different representations. The LRSC method we propose here aims to achieve all three properties for image classification simultaneously: sparsity, locality, and spatial consistency. In Figure 2, we show an example of how LRSC compares with traditional sparse coding (SCSPM) [36]. The columns of $\mathbf{Z}$ generated by LRSC are jointly sparse, i.e. a few (but the same) visual words are used to represent all the local features. This exemplifies how both the sparsity and low-rank properties are satisfied under LRSC. This is not the case for SCSPM, which is known to produce inconsistent codes, especially for features in the same spatial neighborhood. The following three observations explain how LRSC is related to other coding schemes.

• Sparsity: When $\lambda_2 \neq 0$, LRSC leads to sparse codes. When $\lambda_1 = 0$ (i.e. spatial consistency is not considered), the objective in Eq. (1) decouples into independent $\ell_1$-regularized least-squares problems, one per column of $\mathbf{Z}$, and our method degenerates into SCSPM.

• Locality: When encoding features in a local region, we adopt locality in selecting the codebook $\mathbf{D}$. Similar to LLC and LSC, we construct $\mathbf{D}$ from the elements of the universal codebook that are nearest to each local feature.

• Spatial Consistency: By enforcing the low-rank property in a local region, spatial information is encoded and features are constrained to have similar codes. In comparison, LCSRC [31] incorporates spatial consistency when selecting an ‘optimal’ codebook for each feature separately, and then computes the feature codes of the image independently. With this method, there is no direct guarantee that features in a local region have similar codes.
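The locality step can be sketched as follows. The paper does not spell out how the per-feature neighbors are merged into a single dictionary for a region. This hypothetical helper assumes $\mathbf{D}$ is the union of the `k` nearest universal codewords of every feature in the region; the parameter `k` is ours. It also returns the selected indices, so that region codes can be scattered back into universal-codebook coordinates before pooling (again an assumed step).

```python
import numpy as np


def local_dictionary(X_region, B, k=5):
    """Union of the k nearest universal codewords over all features in a region.

    X_region: d x n features of one region; B: d x N_words universal codebook.
    Returns (D, idx) with D = B[:, idx].
    """
    d2 = (np.sum(B**2, axis=0)[:, None]
          - 2.0 * B.T @ X_region
          + np.sum(X_region**2, axis=0)[None, :])
    k = min(k, B.shape[1])
    knn = np.argsort(d2, axis=0)[:k, :]   # k nearest codewords for each feature
    idx = np.unique(knn)                  # sorted union over the region
    return B[:, idx], idx


def scatter_codes(Z_local, idx, n_words):
    """Place region codes (len(idx) x n) back into an n_words x n matrix."""
    Z = np.zeros((n_words, Z_local.shape[1]))
    Z[idx, :] = Z_local
    return Z


B = np.eye(4)
X_region = np.array([[1.0, 0.9], [0.0, 0.1], [0.0, 0.0], [0.0, 0.0]])
D, idx = local_dictionary(X_region, B, k=1)
print(idx)  # [0]
```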

4. Optimization

In Eq. (1), the cost function has two convex, nonsmooth regularizers (the sparse $\|\cdot\|_{1,1}$ regularizer and the low-rank $\|\cdot\|_*$ regularizer), which makes solving it efficiently non-trivial. To handle these two regularizers independently, we introduce two slack variables and two equality constraints, as in Eq. (2).

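$$ \min_{\mathbf{Z}_1, \mathbf{Z}_2, \mathbf{Z}_3} \ \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Z}_3\|_F^2 + \lambda_1\|\mathbf{Z}_1\|_* + \lambda_2\|\mathbf{Z}_2\|_{1,1} \quad \text{such that} \quad \mathbf{Z}_3 = \mathbf{Z}_1, \ \mathbf{Z}_3 = \mathbf{Z}_2 \tag{2} $$

This transformed problem can be minimized using the conventional Inexact Augmented Lagrange Multiplier (IALM) method, which has favorable convergence properties and is extensively used in matrix rank minimization problems [29]. IALM is an iterative method that augments the Lagrangian function with quadratic penalty terms, which allows closed-form updates for each of the variables. The updates are closed form because of the identities in Eqs. (3) and (4), where $\mathcal{S}_\lambda(A_{ij}) = \mathrm{sign}(A_{ij}) \max(0, |A_{ij}| - \lambda)$ is the soft-thresholding operator and $\mathcal{J}_\lambda(\mathbf{A}) = \mathbf{U}_A \mathcal{S}_\lambda(\boldsymbol{\Sigma}_A) \mathbf{V}_A^\top$ is the singular value soft-thresholding operator, with $\mathbf{A} = \mathbf{U}_A \boldsymbol{\Sigma}_A \mathbf{V}_A^\top$ being the SVD of $\mathbf{A}$.

$$ \mathbf{X}^* = \arg\min_{\mathbf{X}} \|\mathbf{X} - \mathbf{A}\|_F^2 + 2\lambda\|\mathbf{X}\|_{1,1} = \mathcal{S}_\lambda(\mathbf{A}) \tag{3} $$

$$ \mathbf{X}^* = \arg\min_{\mathbf{X}} \|\mathbf{X} - \mathbf{A}\|_F^2 + 2\lambda\|\mathbf{X}\|_* = \mathcal{J}_\lambda(\mathbf{A}) \tag{4} $$

By introducing augmented Lagrange multipliers to incorporate the equality constraints into the cost function, we obtain the Lagrangian function in Eq. (5), which, as we show in what follows, can be minimized through a sequence of simple closed-form update operations.

$$ \begin{aligned} \mathcal{L}(\mathbf{Z}_{1:3}) = {} & \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Z}_3\|_F^2 + \lambda_1\|\mathbf{Z}_1\|_* + \lambda_2\|\mathbf{Z}_2\|_{1,1} \\ & + \mathrm{tr}[\mathbf{Y}_1^\top(\mathbf{Z}_3 - \mathbf{Z}_1)] + \frac{u_1}{2}\|\mathbf{Z}_3 - \mathbf{Z}_1\|_F^2 \\ & + \mathrm{tr}[\mathbf{Y}_2^\top(\mathbf{Z}_3 - \mathbf{Z}_2)] + \frac{u_2}{2}\|\mathbf{Z}_3 - \mathbf{Z}_2\|_F^2 \end{aligned} \tag{5} $$

Here, $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are Lagrange multipliers, and $u_1 > 0$ and $u_2 > 0$ are two penalty parameters. The above problem can be minimized by either exact or inexact ALM algorithms [24]. For efficiency, we choose the inexact ALM; its convergence properties can be proven similarly to those in [24]. IALM is an iterative algorithm that solves for each variable in a blockwise coordinate descent fashion. In other words, each iteration of IALM updates the variables one at a time, with the other variables fixed to their most recent values. Consequently, we obtain four update steps corresponding to the four sets of variables we need to optimize. Note that all steps have closed-form solutions.

Step 1: [Update $\mathbf{Z}_1$] This requires solving the problem in Eq. (6), whose solution follows from Eq. (4).

$$ \mathbf{Z}_1^* = \arg\min_{\mathbf{Z}_1} \frac{\lambda_1}{u_1}\|\mathbf{Z}_1\|_* + \frac{1}{2}\left\|\mathbf{Z}_1 - \left(\mathbf{Z}_3 + \frac{1}{u_1}\mathbf{Y}_1\right)\right\|_F^2 = \mathcal{J}_{\lambda_1/u_1}\!\left(\mathbf{Z}_3 + \frac{1}{u_1}\mathbf{Y}_1\right) \tag{6} $$

Step 2: [Update $\mathbf{Z}_2$] This is done by solving Eq. (7), whose solution follows from Eq. (3).

$$ \mathbf{Z}_2^* = \arg\min_{\mathbf{Z}_2} \frac{\lambda_2}{u_2}\|\mathbf{Z}_2\|_{1,1} + \frac{1}{2}\left\|\mathbf{Z}_2 - \left(\mathbf{Z}_3 + \frac{1}{u_2}\mathbf{Y}_2\right)\right\|_F^2 = \mathcal{S}_{\lambda_2/u_2}\!\left(\mathbf{Z}_3 + \frac{1}{u_2}\mathbf{Y}_2\right) \tag{7} $$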
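The two closed-form operators $\mathcal{S}_\lambda$ and $\mathcal{J}_\lambda$ translate directly into NumPy. A minimal sketch (the function names are ours):

```python
import numpy as np


def soft_threshold(A, tau):
    """S_tau(A): entrywise sign(a) * max(0, |a| - tau); solves Eq. (3) with lambda = tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """J_tau(A) = U S_tau(Sigma) V^T; solves Eq. (4) with lambda = tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt   # scale column j of U by the shrunk s_j
```

With these helpers, Step 1 reads `singular_value_threshold(Z3 + Y1 / u1, lam1 / u1)` and Step 2 reads `soft_threshold(Z3 + Y2 / u2, lam2 / u2)`. Soft-thresholding zeroes small entries (sparsity), while singular value thresholding zeroes small singular values (low rank).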


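Step 3: [Update $\mathbf{Z}_3$] This is done by solving Eq. (8). Setting the gradient of Eq. (8) with respect to $\mathbf{Z}_3$ to zero gives the closed-form solution in Eq. (9).

$$ \begin{aligned} \mathbf{Z}_3^* = \arg\min_{\mathbf{Z}_3} {} & \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Z}_3\|_F^2 + \mathrm{tr}[\mathbf{Y}_1^\top(\mathbf{Z}_3 - \mathbf{Z}_1)] + \frac{u_1}{2}\|\mathbf{Z}_3 - \mathbf{Z}_1\|_F^2 \\ & + \mathrm{tr}[\mathbf{Y}_2^\top(\mathbf{Z}_3 - \mathbf{Z}_2)] + \frac{u_2}{2}\|\mathbf{Z}_3 - \mathbf{Z}_2\|_F^2 \end{aligned} \tag{8} $$

$$ \mathbf{Z}_3^* = \left(\mathbf{D}^\top\mathbf{D} + u_1\mathbf{I} + u_2\mathbf{I}\right)^{-1}\mathbf{G} \tag{9} $$

where $\mathbf{G} = \mathbf{D}^\top\mathbf{X} - \mathbf{Y}_1 - \mathbf{Y}_2 + u_1\mathbf{Z}_1 + u_2\mathbf{Z}_2$.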

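Step 4: [Update Multipliers] They are updated as in Eq. (10), where $\rho > 1$ is a user-defined constant.

$$ \mathbf{Y}_1 = \mathbf{Y}_1 + u_1(\mathbf{Z}_3 - \mathbf{Z}_1); \quad \mathbf{Y}_2 = \mathbf{Y}_2 + u_2(\mathbf{Z}_3 - \mathbf{Z}_2); \quad u_1 = \rho u_1; \quad u_2 = \rho u_2 \tag{10} $$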
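Putting Steps 1 to 4 together gives the loop below. It is a hedged sketch of the procedure described above, not the authors' implementation. It sets $u_1 = u_2 = u$, but the initial value of $u$, the cap `u_max`, `rho`, and `max_iter` are illustrative assumptions, and the stopping rule combines the relative change in $\mathbf{Z}_3$ with the constraint residuals. The $\mathbf{Z}_3$ system is solved directly in each iteration for clarity.

```python
import numpy as np


def _soft(A, tau):
    """Entrywise soft-thresholding S_tau (Eq. (3))."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def _svt(A, tau):
    """Singular value soft-thresholding J_tau (Eq. (4))."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def lrsc_ialm(X, D, lam1, lam2, u=1e-2, rho=1.1, u_max=1e6, tol=1e-3, max_iter=500):
    """IALM sketch for Eq. (2). X: d x n region features, D: d x m dictionary.

    Returns the sparse copy Z2 (m x n); at convergence Z1, Z2 and Z3 coincide.
    """
    m, n = D.shape[1], X.shape[1]
    Z1, Z2, Z3 = (np.zeros((m, n)) for _ in range(3))
    Y1, Y2 = np.zeros((m, n)), np.zeros((m, n))
    DtD, DtX, I = D.T @ D, D.T @ X, np.eye(m)
    for _ in range(max_iter):
        Z1 = _svt(Z3 + Y1 / u, lam1 / u)                   # Step 1, Eq. (6)
        Z2 = _soft(Z3 + Y2 / u, lam2 / u)                  # Step 2, Eq. (7)
        G = DtX - Y1 - Y2 + u * Z1 + u * Z2
        Z3_new = np.linalg.solve(DtD + 2.0 * u * I, G)     # Step 3, Eq. (9) with u1 = u2 = u
        change = np.linalg.norm(Z3_new - Z3) / max(1.0, np.linalg.norm(Z3))
        Z3 = Z3_new
        Y1 = Y1 + u * (Z3 - Z1)                            # Step 4, Eq. (10)
        Y2 = Y2 + u * (Z3 - Z2)
        u = min(rho * u, u_max)
        residual = max(np.abs(Z3 - Z1).max(), np.abs(Z3 - Z2).max())
        if change < tol and residual < tol:
            break
    return Z2
```

With `lam1 = 0`, Step 1 returns its input unchanged and the loop solves the per-column $\ell_1$ problem noted in Section 3.2; with both weights set to zero and a full-column-rank `D`, it simply recovers the least-squares code.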

Computational Complexity: The IALM algorithm is considered converged when the change in the objective function or in the solution $\mathbf{Z}$ falls below a user-defined threshold $\epsilon = 10^{-3}$. Empirically, we find that our IALM algorithm is insensitive to a large range of $\epsilon$ values. In our implementation, $u_1 = u_2$. The computational bottleneck of LRSC lies in the SVD of matrix $\mathbf{Z}$ in Step 1. Since $\mathbf{Z}$ is low-rank and rectangular, its SVD can be computed efficiently with time complexity $O(mnr)$, where $r$ is its rank, such that $r < \sqrt{\min(m, n)}$. Because $r$ is usually small compared to $m$ and $n$, and the matrix inversion in Step 3 can be done via an eigenvalue decomposition of $\mathbf{D}^\top\mathbf{D}$ computed only once at the start of the optimization, the overall computational complexity of LRSC is $O(m^3 + mn\epsilon^{-0.5})$, where the number of IALM iterations is $O(\epsilon^{-0.5})$. In comparison, SCSPM solves $n$ $\ell_1$ minimization (LASSO) problems independently and thus has a time complexity of $O(m^2 n d)$, which is significantly slower than our coding method. In practice, we observe that LRSC is usually about 4 times faster than SCSPM.
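The one-time eigenvalue decomposition works because $\mathbf{D}^\top\mathbf{D} = \mathbf{V}\,\mathrm{diag}(\mathbf{w})\,\mathbf{V}^\top$ implies $(\mathbf{D}^\top\mathbf{D} + c\mathbf{I})^{-1} = \mathbf{V}\,\mathrm{diag}(1/(\mathbf{w} + c))\,\mathbf{V}^\top$ for any $c > 0$. The Step 3 solve therefore costs only matrix products, even though $c = u_1 + u_2$ changes every iteration. A small NumPy sketch of this idea (names and matrix sizes are illustrative):

```python
import numpy as np


def make_step3_solver(D):
    """Factor D^T D once; return solve(G, c) = (D^T D + c I)^{-1} G for any c > 0."""
    w, V = np.linalg.eigh(D.T @ D)          # D^T D = V diag(w) V^T, w >= 0
    def solve(G, c):
        return V @ ((V.T @ G) / (w + c)[:, None])
    return solve


rng = np.random.default_rng(1)
D = rng.standard_normal((128, 40))
G = rng.standard_normal((40, 30))
solve = make_step3_solver(D)
for c in (0.02, 2.0):                       # e.g. successive values of u1 + u2
    direct = np.linalg.solve(D.T @ D + c * np.eye(40), G)
    print(c, np.allclose(solve(G, c), direct))  # True for both values
```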

5. Experimental Results

In this section, we experimentally assess the generality of our LRSC method by evaluating its performance on two different tasks: scene classification and object recognition. The effectiveness and efficiency of LRSC are validated by comparison with 7 popular coding methods and with other state-of-the-art approaches where applicable.

Datasets: For the two tasks, LRSC is evaluated on four well-known benchmarks that are widely used in the literature: Scene-13 [10], Caltech-101 [9], Caltech-256 [13], and UIUC 8-Sport [23].

Baseline Methods: We compare LRSC to two types of image classification methods in the literature: (1) methods relevant to feature coding that use the same BoW pipeline for image classification and differ only in how coding is performed, and (2) other well-known classification methods that do not necessarily conform to the BoW pipeline. Direct evaluation of LRSC is made by comparing it to methods of type (1), since all stages of the BoW pipeline (e.g. feature extraction and classification) are kept the same and only the coding stage is varied. For completeness, we also compare the performance of LRSC against that of type (2) methods, even though the features, representations, and classification schemes used there are quite different. We include 7 recent, state-of-the-art type (1) methods, denoted as HC [22], LSC [25], SCSPM [36], LLC [18], LScSPM [11], SC [15], and LCSRC [31]. On UIUC 8-Sport, Scene-15, and Caltech-101, the baseline results for these methods are borrowed from [31]. On the other datasets, we run these methods using publicly available source code or binaries provided by the authors, with default parameters. For each dataset, we also include state-of-the-art type (2) methods and report their results.

Implementation Details: For a fair comparison with type (1) methods, we fix all stages of the BoW classification pipeline except for the feature coding stage. As a reference, we follow the experimental setup in [31] for all our experiments.


Table 1. Classification accuracies on the UIUC 8-Sport data set.

Methods       Accuracies (%)    Methods       Accuracies (%)
HC [22]       79.98 ± 1.67      LScSPM [11]   85.31 ± 0.51
SCSPM [36]    82.74 ± 1.46      SC [15]       85.44 ± 1.54
LLC [18]      81.77 ± 1.51      LCSRC [31]    87.23 ± 1.14
LSC [25]      82.79 ± 2.01      LRSC          88.17 ± 0.85

For completeness, we briefly describe this setup next. (1) Image resizing: Similar to previous methods [36, 25, 18, 11, 31], images are downsized to no more than 300 × 300 pixels for Scene-13, Caltech-101, and Caltech-256, and to no more than 400 × 400 pixels for UIUC 8-Sport. (2) Dense local features: SIFT descriptors [27] with dimension d = 128 are extracted from 16 × 16 pixel patches densely sampled from each image on a grid with a step size of 4 pixels. (3) Codebook: The universal codebook is obtained by running K-means on a randomly selected subset (20%) of the SIFT descriptors in the training set. As in [31], the codebook size depends on the size of the dataset: 1024 for Scene-13, Caltech-101, and UIUC 8-Sport, and 4096 for Caltech-256. As discussed in [36, 18, 6], increasing the codebook size can improve performance. Owing to the locality property of the dictionary discussed in Section 3.2, a larger codebook only slightly increases the computational cost of our algorithm, namely the cost of finding the nearest codewords for each feature point; therefore, our algorithm can retain a good performance level even for large codebook sizes. For a fair comparison, however, we adopt the setup of [31]. (4) Local region: SLIC segmentation [1] is adopted to segment images into multiple superpixels. SLIC has three parameters, MinRegionSize, regionSize, and regularizer, which are set to 100, 24, and 1, respectively. The details are discussed in Section 5.1. (5) Pooling: Max-pooling is performed. To include spatial layout information, SPM [22] with 3 levels (1 × 1, 2 × 2, and 4 × 4) is adopted, with the same weight for each level. (6) Classifier: A one-vs-all linear SVM classifier is used, since it has been shown to achieve state-of-the-art classification performance when paired with max-pooling [36, 25, 18, 31].
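The dense sampling in step (2) and the pyramid pooling in step (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it assumes the grid starts at the image corner with no border margin (which reproduces the 6984 descriptors that Section 5.1 reports for a 300 × 400 image), that each descriptor falls into the pyramid cell containing its patch center, and that max-pooling keeps the largest absolute code value per codeword, a convention common in sparse-coding pipelines. The helpers `grid_positions` and `spm_max_pool` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def grid_positions(height: int, width: int, patch: int = 16,
                   step: int = 4) -> List[Tuple[int, int]]:
    """Top-left corners (y, x) of patch x patch windows on a regular grid.

    Assumption: sampling starts at (0, 0) and windows never cross the border.
    """
    return [(y, x)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]

def spm_max_pool(codes: Sequence[Sequence[float]],
                 centers: Sequence[Tuple[float, float]],
                 height: int, width: int,
                 levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate max-pooled codes over a spatial pyramid of g x g grids.

    codes[i] is the K-dim code of descriptor i and centers[i] = (y, x) is its
    patch center. Each cell keeps the largest absolute value per codeword;
    all cells and levels are weighted equally (assumed pooling rule).
    """
    K = len(codes[0])
    pooled: List[float] = []
    for g in levels:
        cells = [[0.0] * K for _ in range(g * g)]  # empty cells stay zero
        for code, (cy, cx) in zip(codes, centers):
            row = min(int(cy * g / height), g - 1)
            col = min(int(cx * g / width), g - 1)
            cell = cells[row * g + col]
            for k, v in enumerate(code):
                cell[k] = max(cell[k], abs(v))
        for cell in cells:
            pooled.extend(cell)
    return pooled

# Usage: a 300 x 400 image with 16 x 16 patches and a 4-pixel step.
positions = grid_positions(300, 400)
centers = [(y + 8, x + 8) for y, x in positions]  # patch centers for spm_max_pool
print(len(positions))                             # 6984 (72 rows x 97 columns)
print(sum(g * g for g in (1, 2, 4)) * 1024)       # 21504-dim vector, 1024 codewords
```

With three pyramid levels there are 1 + 4 + 16 = 21 cells, so the pooled image vector is 21 times the codebook size before it reaches the linear SVM.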

5.1. UIUC 8-Sport Data Set

UIUC 8-Sport [23] contains 1792 images in 8 categories for image-based event classification: badminton, bocce, croquet, polo, rock climbing, rowing, sailing, and snowboarding. The size of each category ranges from 137 to 250 images. Following the standard setting for this data set, we use 10 random splits of the data; in each split, we randomly select 70 training images and 60 test images per category. Table 1 reports the average classification accuracy (and standard deviation) of the state-of-the-art coding approaches and the proposed LRSC method. As we can see, SCSPM is much better than the classic HC method, which shows that sparsity is helpful for image classification. In LLC, LSC, and SC, adding locality can also improve the classification accuracy. The results of LScSPM show that exploiting the relationships among features in their d-dimensional feature space improves classification further.
Table 2. The influence of image partition on LRSC.

Partition    Accuracies (%)
20 × 20      86.2 ± 0.81
30 × 30      88.17 ± 0.85
40 × 40      87.3 ± 0.91

In LCSRC and our LRSC, adding local spatial information improves classification accuracy significantly compared to the first six methods, and LRSC achieves a moderate improvement over the state-of-the-art feature coding methods. Compared with other state-of-the-art methods that are not based on feature coding, LRSC (Table 1) is much better than GIST [28] (63.88), [19] (83.54 ± 1.13), [23] (73.4), and [30] (86.25), with about a 2% improvement over the latest work [30].
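The evaluation protocol above (10 random splits, a fixed number of training and test images per category, mean and standard deviation of accuracy) can be sketched as follows. This is an illustrative sketch, not the authors' evaluation script: `per_class_split` and `repeated_accuracy` are hypothetical helpers, `evaluate` stands in for the whole coding, pooling, and SVM pipeline, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which one is reported.

```python
import random
import statistics
from typing import Callable, Dict, List, Optional, Sequence, Tuple

def per_class_split(labels: Sequence[int], n_train: int,
                    n_test: Optional[int], rng: random.Random
                    ) -> Tuple[List[int], List[int]]:
    """Return disjoint train/test image indices, drawn independently per class.

    n_test=None puts all remaining images of each class in the test set.
    """
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train: List[int] = []
    test: List[int] = []
    for y in sorted(by_class):
        shuffled = rng.sample(by_class[y], len(by_class[y]))  # random permutation
        stop = None if n_test is None else n_train + n_test
        if stop is not None and stop > len(shuffled):
            raise ValueError(f"class {y} has only {len(shuffled)} images")
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:stop])
    return train, test

def repeated_accuracy(labels: Sequence[int],
                      evaluate: Callable[[List[int], List[int]], float],
                      n_train: int = 70, n_test: Optional[int] = 60,
                      n_splits: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy over random splits.

    `evaluate` is a hypothetical callback that trains the classifier on the
    training indices and returns the test accuracy in percent.
    """
    rng = random.Random(seed)
    accs = [evaluate(*per_class_split(labels, n_train, n_test, rng))
            for _ in range(n_splits)]
    return statistics.mean(accs), statistics.stdev(accs)
```

The defaults mirror the UIUC 8-Sport setting; the "rest for testing" protocols used for the other data sets correspond to `n_test=None` with `n_train=100` or `n_train=30`.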

Since LRSC encodes features within local regions (superpixels), we now study the influence of image partition on LRSC by comparing its performance under different partition settings. In this work, we adopt the SLIC approach [1] to segment images into multiple homogeneous patches (superpixels). MinRegionSize and regularizer are set to 100 and 1, respectively, and for comparison we vary regionSize (the nominal size of the superpixels) among three values: 20, 30, and 40. The corresponding results are reported in Table 2. There is only a slight difference among the three settings. Based on the best result, we set regionSize to 30. Optimizing the partition parameters for each task and dataset could further improve LRSC performance; we leave this for future work.
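Because LRSC codes the descriptors of each superpixel together, the partition decides which descriptors share a low-rank code matrix. A minimal sketch of that grouping step follows, assuming each dense descriptor is assigned to the superpixel that contains its patch center. The label map would come from a SLIC implementation; here it is just a nested list, and `group_by_superpixel` is a hypothetical helper. For the 300 × 400 image used in the runtime comparison, 6984 descriptors over 108 superpixels give roughly 65 descriptors per group on average.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def group_by_superpixel(label_map: Sequence[Sequence[int]],
                        centers: Sequence[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each superpixel id to the indices of descriptors centered inside it.

    label_map[y][x] is the superpixel id of pixel (y, x); centers[i] = (y, x)
    is the integer patch center of descriptor i (assumed assignment rule).
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, (y, x) in enumerate(centers):
        groups[label_map[y][x]].append(i)
    return dict(groups)

# Usage: a 4 x 4 toy label map split into a left (0) and a right (1) superpixel.
toy_map = [[0, 0, 1, 1] for _ in range(4)]
print(group_by_superpixel(toy_map, [(0, 0), (1, 3), (3, 1), (2, 2)]))
# {0: [0, 2], 1: [1, 3]}
```

Each returned index list selects the columns of one d × n descriptor matrix that is then coded jointly.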

Table 3 reports the runtime of all coding methods on the same image. For one 300 × 400 image with 108 segments, 6984 SIFT descriptors are extracted. When all features are coded with a codebook of size 1024, LRSC is much faster than SCSPM [36], because LRSC encodes the local features of each superpixel jointly, which is much more efficient than SCSPM's independent encoding of each feature (6984 minimization problems). LRSC is also comparable in runtime with HC [22], SC [15], LLC [18], and LSC [25], which do not perform expensive optimization. We could not compare against the runtime of LScSPM and LCSRC because their source code was not available. However, we expect LRSC to be faster, since LCSRC needs to solve an expensive multi-label MRF problem. All experiments are run in MATLAB on a 2.66 GHz Intel Core 2 Duo PC with 18 GB RAM.
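The efficiency argument rests on solving one matrix problem per superpixel instead of one vector problem per descriptor. The LRSC objective itself is defined earlier in the paper and is not reproduced here; the NumPy sketch below is a deliberately simplified stand-in that keeps only a reconstruction term and a nuclear-norm penalty (the usual convex surrogate for rank), and drops the sparsity and locality terms. It codes all descriptors X of one superpixel at once by proximal gradient descent, whose key step is singular value thresholding. The function names and the weight `lam` are hypothetical.

```python
import numpy as np

def svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: the proximal operator of tau * ||M||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values toward 0

def objective(X: np.ndarray, B: np.ndarray, Z: np.ndarray, lam: float) -> float:
    """0.5 * ||X - B Z||_F^2 + lam * ||Z||_* (simplified, no sparsity term)."""
    residual = X - B @ Z
    return 0.5 * float(np.sum(residual ** 2)) + lam * float(np.linalg.norm(Z, "nuc"))

def low_rank_code(X: np.ndarray, B: np.ndarray, lam: float,
                  n_iter: int = 200) -> np.ndarray:
    """Jointly code the d x n descriptor matrix X of one superpixel.

    B is a d x K codebook (or a local sub-codebook); returns the K x n codes.
    """
    step = 1.0 / np.linalg.norm(B, 2) ** 2    # 1 / Lipschitz constant of the gradient
    Z = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = B.T @ (B @ Z - X)              # gradient of the reconstruction term
        Z = svt(Z - step * grad, lam * step)  # proximal step on the nuclear norm
    return Z
```

In a pipeline like the one above, `low_rank_code` would be called once per superpixel (108 times for the Table 3 image), with X holding that superpixel's SIFT descriptors as columns, whereas per-descriptor coding solves 6984 separate problems.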

5.2. Scene-13 Data Set

Scene-13 [10] consists of 3859 images, each belonging to one of 13 categories with 200 to 400 images each. The categories range from outdoor scenes such as mountain and forest to indoor environments such as living room and kitchen. Following the standard setup, we use 10 random splits of the data, with 100 random images per class for

Table 3. Runtime of different coding methods on a 300 × 400 image with 6984 SIFT descriptors. The codebook size is 1024, and the number of superpixels is 108.

Methods       Time (seconds)    Methods       Time (seconds)
HC [22]       1.66              LScSPM [11]   -
SCSPM [36]    8.27              SC [15]       1.73
LLC [18]      1.65              LCSRC [31]    -
LSC [25]      1.71              LRSC          2.33

Table 4. Classification accuracies on the Scene-13 data set.


Methods 

Accuracies  (%) 

Methods 

Accuracies  (%) 

HC  [22] 
SCSPM  [36] 
LLC  [18] 
LSC  [25] 

77.20 ± 0.41
83.14 ± 0.45

83.25 ± 0.36
83.33 ± 0.44

LScSPM  [11] 
SC  [15] 
LCSRC  [31] 
LRSC 

82.11  ±0.34 

85.13  ±  0.53 

training and the rest for testing. The comparative results are shown in Table 4. LRSC performs best among all the feature coding methods, with an improvement of about 2%. LRSC also performs much better than other state-of-the-art methods [19, 10, 4] ([19]: 83.54 ± 1.13; [10]: 65.2), with an improvement of about 2%.

5.3. Caltech-101 Data Set

Caltech-101 [9] contains 9144 images in 101 classes, such as animals, vehicles, and flowers, with high shape variability. The number of images per category varies from 31 to 800. Following the standard experimental setting, we use 10 random splits of the data, with 30 random images per class for training and the rest for testing. The average classification rates are reported in Table 5. These results show that LRSC performs best among the compared coding methods. Compared to SCSPM and LLC, LRSC performs much better, with an improvement of about 3%. It also improves on LCSRC by about 2%. We therefore conclude that exploiting spatial consistency directly in the coding stage improves classification performance by about 3% on average. We could not compare against LScSPM because its source code was not available. We also compare our results to the state of the art using a single descriptor type on Caltech-101. LRSC is better than [3] (70.4), [17] (69.6), [38] (66.2 ± 0.5), and [5] (75.1 ± 0.9), and is comparable to [5] (75.7 ± 1.1), which adopts a kernel SVM as the classifier. Boureau et al. [6] (77.3 ± 0.6), who adopt macrofeatures and cross-validation to tune parameters, and [8] (80.3 ± 1.2), which adopts a kernel SVM, show much better performance than LRSC. Note that better performance has been reported with multiple descriptor types, e.g., by methods using multiple kernel learning (77.7 ± 0.3% [12], 78.0 ± 0.3% [14, 34], and 84.3% [35]) or subcategory learning (83% [32]).

5.4. Caltech-256 Data Set

Caltech-256 [13] contains 256 categories plus a background class, with 29780 images in total. Compared to Caltech-101, where objects are often in the center of the image, it has much higher intra-class variability and object location variability, which makes it a very challenging data set for object recognition. Following
Table 5. Classification accuracies on the Caltech-101 data set.

Methods       Accuracies (%)    Methods       Accuracies (%)
HC [22]       69.43 ± 0.52      LScSPM [11]   -
SCSPM [36]    72.20 ± 1.30      SC [15]       69.55 ± 0.83
LLC [18]      71.67 ± 0.86      LCSRC [31]    73.23 ± 0.81
LSC [25]      72.58 ± 1.08      LRSC          75.02 ± 0.74
Table 6. Classification accuracies on the Caltech-256 data set.

Methods       Accuracies (%)    Methods       Accuracies (%)
HC [22]       21.82 ± 0.22      LScSPM [11]   35.74 ± 0.10
SCSPM [36]    34.02 ± 0.35      SC [15]       34.60 ± 0.27
LLC [18]      37.41 ± 0.21      LCSRC [31]    -
LSC [25]      38.15 ± 0.26      LRSC          41.04 ± 0.23

the standard experimental setting, we use 10 random splits of the data, with 30 random images per class for training and the rest for testing, and list the average classification rates in Table 6. This table shows that LRSC outperforms the other coding methods on this data set, with an improvement of about 3%. Compared with other state-of-the-art methods, LRSC is also much better than [21] (36.3), [8] (38.1 ± 0.6), and [3] with one descriptor (37.0), and is comparable to Boureau et al. [6] (41.7 ± 0.8), who use macrofeatures and cross-validation. In addition, NBNN [3] with 5 descriptors (42.0) and Todorovic et al. [32] (49.5) perform much better owing to their use of multiple features.

6. Conclusion

In this paper, we present a new coding technique for local features that employs low-rank sparse learning. This method exploits sparsity in individual codes, locality in codebook selection, and low-rankness in constraining sparse codes belonging to the same spatial neighborhood. Although the selection of spatial neighborhoods (superpixels) might not be optimal, our extensive results show that our method improves upon the state of the art and increases classification accuracy on several benchmarks. For future work, we will systematically study how image partition can be combined with low-rank sparse coding in one unified framework.